Learning Filterbanks from Raw Speech for Phone Recognition

Authors

  • Neil Zeghidour
  • Nicolas Usunier
  • Iasonas Kokkinos
  • Thomas Schatz
  • Gabriel Synnaeve
  • Emmanuel Dupoux
Abstract

We train a bank of complex filters that operates on the raw waveform and is fed into a convolutional neural network for end-to-end phone recognition. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of mel-filterbanks, and then fine-tuned jointly with the remaining convolutional architecture. We perform phone recognition experiments on TIMIT and show that for several architectures, models trained on TD-filterbanks consistently outperform their counterparts trained on comparable mel-filterbanks. We get our best performance by learning all front-end steps, from pre-emphasis up to averaging. Finally, we observe that the filters at convergence have an asymmetric impulse response, and that some of them remain almost analytic.
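The front-end steps named in the abstract (pre-emphasis, complex filtering of the raw waveform, averaging) can be illustrated with a small NumPy sketch. This is not the authors' implementation, and it only shows a fixed forward pass, not the joint training. Everything the abstract does not state is an assumption made for illustration: Gabor-shaped filters with mel-spaced centre frequencies, 40 filters of 400 samples at 16 kHz, a 160-sample stride, a squared Hann window for the averaging, a 0.97 pre-emphasis coefficient, and `log(1 + x)` compression. The function names `gabor_filterbank` and `td_filterbank_features` are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def gabor_filterbank(n_filters=40, width=400, sample_rate=16000):
    """Complex Gabor filters with mel-spaced centres (assumed initialization)."""
    # n_filters + 2 mel-spaced edges; filter i is centred on edge i + 1.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0),
                                  hz_to_mel(sample_rate / 2.0),
                                  n_filters + 2))
    centres = edges[1:-1]
    bandwidths = edges[2:] - edges[:-2]  # Hz, stand-in for the mel filter width
    t = (np.arange(width) - width // 2) / sample_rate  # time axis in seconds
    # Assumption: Gaussian envelope, narrower in time for wider bands.
    sigma_t = 1.0 / (np.pi * bandwidths)
    envelope = np.exp(-0.5 * (t[None, :] / sigma_t[:, None]) ** 2)
    carrier = np.exp(2j * np.pi * centres[:, None] * t[None, :])
    filters = envelope * carrier
    filters /= np.linalg.norm(filters, axis=1, keepdims=True)  # unit L2 norm
    return filters, centres

def td_filterbank_features(wave, filters, stride=160, preemph=0.97):
    """Pre-emphasis -> complex filtering -> squared modulus -> averaging -> log."""
    # Pre-emphasis: first-order high-pass difference.
    x = np.append(wave[0], wave[1:] - preemph * wave[:-1])
    width = filters.shape[1]
    # Complex filtering of the waveform, one filter per output channel.
    response = np.stack([np.convolve(x, f, mode="valid") for f in filters])
    energy = np.abs(response) ** 2
    # Averaging: low-pass with a normalized squared Hann window, then decimate.
    window = np.hanning(width) ** 2
    window /= window.sum()
    smoothed = np.stack([np.convolve(e, window, mode="valid")[::stride]
                         for e in energy])
    return np.log1p(smoothed)

# Usage: a one-second pure tone placed on the centre of channel 20.
filters, centres = gabor_filterbank()
time = np.arange(16000) / 16000.0
tone = np.cos(2.0 * np.pi * centres[20] * time)
features = td_filterbank_features(tone, filters)
```

In a trainable version, the pre-emphasis kernel, the complex filters, and the averaging window would each be a learnable convolution initialized as above; the sketch keeps them fixed to stay short.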

Similar resources

A deep scattering spectrum - Deep Siamese network pipeline for unsupervised acoustic modeling

Recent work has explored deep architectures for learning acoustic features in an unsupervised or weakly-supervised way for phone recognition. Here we investigate the role of the input features, and in particular we test whether standard mel-scaled filterbanks could be replaced by inherently richer representations, such as derived from an analytic scattering spectrum. We use a Siamese network us...

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems for feature extraction as well as acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

Comparison of IIR Filterbanks and FFT Filterbanks in Cochlear Implant Speech Processing Strategies

The first stage of all cochlear implant speech processing strategies is a filterbank which decomposes the speech signal into several frequency bands. Two types of filterbanks can be used to achieve this goal and mimic the spectral analysis performed within the cochlea. The first is based on the Fast Fourier Transform (FFT). The second type of filterbank is composed of Infinite Impulse Response ...

Design of Detectors for Automatic Speech Recognition

This thesis presents methods and results for optimizing subword detectors in continuous speech. Speech detectors are useful within areas like detection-based ASR, pronunciation training, phonetic analysis, word spotting, etc. Firstly, we propose a structure suitable for subword detection. This structure is based on the standard HMM framework, but in each detector the MFCC feature extractor and t...

Journal:
  • CoRR

Volume: abs/1711.01161

Publication date: 2017